Guided Open Vocabulary Image Captioning with Constrained Beam Search
Authors
Abstract
Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real-world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-training. Our method uses constrained beam search to force the inclusion of selected tag words in the output, and fixed, pretrained word embeddings to facilitate vocabulary expansion to previously unseen tag words. Using this approach, we achieve state-of-the-art results for out-of-domain captioning on MSCOCO (and improved results for in-domain captioning). Perhaps surprisingly, our results significantly outperform approaches that incorporate the same tag predictions into the learning algorithm. We also show that we can significantly improve the quality of generated ImageNet captions by leveraging ground-truth labels.
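The abstract does not describe how the constrained beam search is organized internally. Below is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the constraints are single words. The captioning model is replaced by a hypothetical `score_next` function that returns next-word log-probabilities, and a hypothetical `BIGRAMS` table with a `toy_scorer` stands in for a trained captioner. The sketch keeps a separate beam for each set of constraint words already emitted. That way, partial captions that have included a tag word are not pruned by higher-scoring captions that have not, and only captions containing every constraint are returned. Vocabulary expansion via fixed word embeddings is not modeled.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Hyp = Tuple[float, Tuple[str, ...]]  # (cumulative log-probability, tokens so far)
Scorer = Callable[[Tuple[str, ...]], Dict[str, float]]  # prefix -> {next word: log-prob}


def constrained_beam_search(
    score_next: Scorer,
    constraints: Sequence[str],
    beam_size: int = 3,
    max_len: int = 10,
    eos: str = "</s>",
) -> str:
    """Return the best-scoring caption that contains every word in `constraints`."""
    required = frozenset(constraints)
    # One beam per state; a state is the set of constraint words emitted so far.
    beams: Dict[FrozenSet[str], List[Hyp]] = {frozenset(): [(0.0, ())]}
    finished: Dict[FrozenSet[str], List[Hyp]] = {}
    for _ in range(max_len):
        candidates: Dict[FrozenSet[str], List[Hyp]] = {}
        for state, hyps in beams.items():
            for logp, toks in hyps:
                for word, word_logp in score_next(toks).items():
                    # Emitting a constraint word moves the hypothesis to a new state.
                    new_state = state | (frozenset([word]) & required)
                    hyp = (logp + word_logp, toks + (word,))
                    pool = finished if word == eos else candidates
                    pool.setdefault(new_state, []).append(hyp)
        # Prune each state's beam independently, so hypotheses that already
        # satisfy some constraints never compete with ones that do not.
        beams = {s: sorted(h, reverse=True)[:beam_size] for s, h in candidates.items()}
        if not beams:
            break
    # Only hypotheses in the accepting state (all constraints met) qualify.
    accepted = finished.get(required, []) or beams.get(required, [])
    if not accepted:
        raise ValueError("no hypothesis satisfied all constraints")
    _, toks = max(accepted)
    return " ".join(t for t in toks if t != eos)


# Hypothetical bigram "captioner" standing in for a trained captioning model.
BIGRAMS: Dict[str, Dict[str, float]] = {
    "<s>": {"a": -0.1, "the": -2.3},
    "a": {"dog": -0.2, "cat": -1.6, "frisbee": -4.0},
    "the": {"dog": -0.4},
    "dog": {"runs": -0.3, "catches": -1.2},
    "cat": {"sleeps": -0.5},
    "runs": {"</s>": -0.1},
    "sleeps": {"</s>": -0.1},
    "catches": {"frisbee": -0.7},
    "frisbee": {"</s>": -0.1},
}


def toy_scorer(toks: Tuple[str, ...]) -> Dict[str, float]:
    """Score next words from the last token only (a bigram model)."""
    last = toks[-1] if toks else "<s>"
    return BIGRAMS.get(last, {})


print(constrained_beam_search(toy_scorer, []))           # a dog runs
print(constrained_beam_search(toy_scorer, ["frisbee"]))  # a dog catches frisbee
```

With `beam_size=1`, an ordinary greedy decoder on this toy model outputs "a dog runs". The sketch still returns "a frisbee", because the low-scoring hypothesis that emitted the tag survives in its own beam. The number of states grows as 2 to the power of the number of constraints, so this layout is only practical for a handful of tags.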
Similar resources
The effects of captioning texts and caption ordering on L2 listening comprehension and vocabulary learning
This study investigated the effects of captioned texts on second/foreign (L2) listening comprehension and vocabulary gains using a computer multimedia program. Additionally, it explored the caption ordering effect (i.e. captions displayed during the first or second listening), and the interaction of captioning order with the L2 proficiency level of language learners in listening comprehension a...
The Impact of Residual Geometric Inaccuracies on Normal Organ Doses in Image Guided-Radiation Therapy of Prostate Cancer Using On-Board Kilovoltage Cone-Beam Computed Tomography
Introduction: The aim of this retrospective study was to evaluate the variations in delivered dose to the bladder, rectum, and femoral heads of prostate cancer patients during a course of treatment by image-guided radiation therapy (IGRT). Materials and Methods: Overall, 15 patients with prostate cancer were selected. Each week, for each patient five consecutive cone beam computed tomograph...
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role in understanding images and has demonstrated its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
Automated Image Captioning for Rapid Prototyping and Resource Constrained Environments
Significant performance gains in deep learning coupled with the exponential growth of image and video data on the Internet have resulted in the recent emergence of automated image captioning systems. Ensuring scalability of automated image captioning systems with respect to the ever increasing volume of image and video data is a significant challenge. This paper provides a valuable insight in t...
Improved Beam Search with Constrained Softmax for NMT
We propose an improved beam search decoding algorithm with constrained softmax operations for neural machine translation (NMT). NMT is a newly emerging approach to predict the best translation by building a neural network instead of a log-linear model. It has achieved comparable translation quality to the existing phrase-based statistical machine translation systems. However, how to perform eff...